154 research outputs found

On Equivalence and Cores for Incomplete Databases in Open and Closed Worlds

Author: Forssell Henrik
Kharlamov Evgeny
Thorstensen Evgenij
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Conference on Database Theory (ICDT 2020)
Publication date: 01/01/2020

Data exchange heavily relies on the notion of incomplete database instances. Several semantics for such instances have been proposed and include open (OWA), closed (CWA), and open-closed (OCWA) world. For all these semantics important questions are: whether one incomplete instance semantically implies another; when two are semantically equivalent; and whether a smaller or smallest semantically equivalent instance exists. For OWA and CWA these questions are fully answered. For several variants of OCWA, however, they remain open. In this work we address these questions for Closed Powerset semantics and the OCWA semantics of [Leonid Libkin and Cristina Sirangelo, 2011]. We define a new OCWA semantics, called OCWA*, in terms of homomorphic covers that subsumes both semantics, and characterize semantic implication and equivalence in terms of such covers. This characterization yields a guess-and-check algorithm to decide equivalence, and shows that the problem is NP-complete. For the minimization problem we show that for several common notions of minimality there is in general no unique minimal equivalent instance for Closed Powerset semantics, and consequently not for the more expressive OCWA* either. However, for Closed Powerset semantics we show that one can find, for any incomplete database, a unique finite set of its subinstances which are subinstances (up to renaming of nulls) of all instances semantically equivalent to the original incomplete one. We study properties of this set, and extend the analysis to OCWA*.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

NORA - Norwegian Open Research Archives
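
As a rough illustration of the kind of equivalence test the abstract above refers to, the sketch below covers only the well-understood OWA case, where two incomplete instances are semantically equivalent exactly when homomorphisms exist in both directions. It is not the paper's OCWA* algorithm based on homomorphic covers, and it enumerates candidate mappings by brute force instead of guessing one. The encoding is an assumption made for this example: an instance is a set of (relation, tuple) facts, nulls are strings starting with "_", and the names is_null, find_homomorphism and owa_equivalent are hypothetical.

```python
from itertools import product


def is_null(term):
    # Assumption for this sketch: nulls are strings starting with "_";
    # every other term is treated as a constant.
    return isinstance(term, str) and term.startswith("_")


def find_homomorphism(source, target):
    """Return a mapping of source nulls to target terms that sends every
    source fact to a target fact (constants stay fixed), or None."""
    nulls = sorted({t for _, args in source for t in args if is_null(t)})
    domain = sorted({t for _, args in target for t in args}, key=str)
    # Exhaustive search: exponential in the number of nulls.
    for image in product(domain, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        mapped = {(rel, tuple(h.get(t, t) for t in args)) for rel, args in source}
        if mapped <= target:
            return h
    return None


def owa_equivalent(a, b):
    """OWA equivalence as homomorphic equivalence (both directions)."""
    return (find_homomorphism(a, b) is not None
            and find_homomorphism(b, a) is not None)


# R(_x, _y), R(_x, _x) collapses onto the single fact R(_z, _z).
big = {("R", ("_x", "_y")), ("R", ("_x", "_x"))}
small = {("R", ("_z", "_z"))}
print(owa_equivalent(big, small))
```

Checking one candidate mapping takes polynomial time, which is what makes a guess-and-check procedure possible; the loop above simply tries every candidate in turn.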

Answering Queries using Views over Probabilistic XML: Complexity and Tractability

Author: Cautis Bogdan
Kharlamov Evgeny
Publication date: 01/01/2012

We study the complexity of query answering using views in a probabilistic XML setting, identifying large classes of XPath queries -- with child and descendant navigation and predicates -- for which there are efficient (PTime) algorithms. We consider this problem under the two possible semantics for XML query results: with persistent node identifiers and in their absence. Accordingly, we consider rewritings that can exploit a single view, by means of compensation, and rewritings that can use multiple views, by means of intersection. Since in a probabilistic setting queries return answers with probabilities, the problem of rewriting goes beyond the classic one of retrieving XML answers from views. For both semantics of XML queries, we show that, even when XML answers can be retrieved from views, their probabilities may not be computable. For rewritings that use only compensation, we describe a PTime decision procedure, based on easily verifiable criteria that distinguish between the feasible cases -- when probabilistic XML results are computable -- and the unfeasible ones. For rewritings that can use multiple views, with compensation and intersection, we identify the most permissive conditions that make probabilistic rewriting feasible, and we describe an algorithm that is sound in general, and becomes complete under fairly permissive restrictions, running in PTime modulo worst-case exponential time equivalence tests. This is the best we can hope for since intersection makes query equivalence intractable already over deterministic data. Our algorithm runs in PTime whenever deterministic rewritings can be found in PTime. Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive

Increasing environmental compatibility of metal production

Author: Kharlamov Evgeny
Sharapov Rashid
Yadykina Valentina
Publication venue: EDP Sciences
Publication date: 01/01/2019

Building materials production generates a large amount of harmful substances that poison the atmosphere. One of the major sources of urban environmental pollution is the metallurgical industry. Concentration, in which empty components are extracted from the rock, is one of its most important processes. During ore concentration, an increasing amount of man-made waste is generated; it pollutes the air and large areas around the factories that discharge it. This reduces both the space for people to live and the room for cities to function and develop. It should be noted that metal production enterprises have accumulated billions of tons of industrial waste (tailings) that includes a large amount of iron-containing materials and rocks; these can be used as building materials, for example as a mineral powder when preparing fine-grained concrete, as well as in the construction of roads and houses, in paint production, etc.

EDP Sciences OAI-PMH repository (1.2.0)

Directory of Open Access Journals

Towards Ontology Reshaping for KG Generation with User-in-the-Loop: Applied to Bosch Welding

Author: Chen Jieying
Cheng Gong
Kharlamov Evgeny
Kostylev Egor V.
Zhou Baifan
Zhou Dongzhuoran
Publication venue: Association for Computing Machinery (ACM)
Publication date: 22/09/2022

Knowledge graphs (KG) are used in a wide range of applications. Automating KG generation is highly desirable due to the volume and variety of data in industry. One important approach to KG generation is to map the raw data to a given KG schema, namely a domain ontology, and construct the entities and properties according to the ontology. However, the automatic generation of such an ontology is demanding, and existing solutions are often unsatisfactory. An important challenge is a trade-off between two principles of ontology engineering: knowledge-orientation and data-orientation. The former prescribes that an ontology should model the general knowledge of a domain, while the latter emphasises reflecting the data specificities to ensure good usability. We address this challenge with our method of ontology reshaping, which automates the process of converting a given domain ontology to a smaller ontology that serves as the KG schema. The domain ontology can be designed to be knowledge-oriented, while the KG schema covers the data specificities. In addition, our approach allows the option of including user preferences in the loop. We demonstrate our on-going research on ontology reshaping and present an evaluation using real industrial data, with promising results.

arXiv.org e-Print Archive

GraphMAE2: A Decoding-Enhanced Masked Self-Supervised Graph Learner

Author: Cen Yukuo
Dong Yuxiao
He Yufei
Hou Zhenyu
Kharlamov Evgeny
Liu Xiao
Tang Jie
Publication date: 10/04/2023

Graph self-supervised learning (SSL), including contrastive and generative approaches, offers great potential to address the fundamental challenge of label scarcity in real-world graph data. Among both sets of graph SSL techniques, the masked graph autoencoders (e.g., GraphMAE)--one type of generative method--have recently produced promising results. The idea behind this is to reconstruct the node features (or structures)--that are randomly masked from the input--with the autoencoder architecture. However, the performance of masked feature reconstruction naturally relies on the discriminability of the input features and is usually vulnerable to disturbance in the features. In this paper, we present a masked self-supervised learning framework GraphMAE2 with the goal of overcoming this issue. The idea is to impose regularization on feature reconstruction for graph SSL. Specifically, we design the strategies of multi-view random re-mask decoding and latent representation prediction to regularize the feature reconstruction. The multi-view random re-mask decoding is to introduce randomness into reconstruction in the feature space, while the latent representation prediction is to enforce the reconstruction in the embedding space. Extensive experiments show that GraphMAE2 can consistently generate top results on various public datasets, including at least 2.45% improvements over state-of-the-art baselines on ogbn-Papers100M with 111M nodes and 1.6B edges. Comment: Accepted to WWW'2

arXiv.org e-Print Archive
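
The random masking step mentioned in the GraphMAE2 abstract above can be pictured with a small sketch. This is an illustrative toy using plain Python lists, not the authors' implementation: the zero vector standing in for a mask token, the function names mask_node_features and remask_views, and the parameters mask_rate and num_views are all assumptions made for this example. Encoder, decoder and loss are omitted entirely.

```python
import random


def mask_node_features(features, mask_rate, rng):
    """Replace the feature vectors of a random subset of nodes with zeros.

    Assumption: a zero vector stands in for the mask token.
    Expects a non-empty list of equal-length feature vectors.
    Returns the masked copy and the sorted ids of the masked nodes.
    """
    num_nodes = len(features)
    num_masked = int(mask_rate * num_nodes)
    masked_ids = sorted(rng.sample(range(num_nodes), num_masked))
    dim = len(features[0])
    masked = [list(row) for row in features]  # copy, input stays untouched
    for i in masked_ids:
        masked[i] = [0.0] * dim
    return masked, masked_ids


def remask_views(latents, mask_rate, num_views, rng):
    """Draw several independently re-masked copies of the latent vectors.

    Rough analogue of multi-view re-masking: each view would be handed
    to a decoder separately (the decoder is not part of this sketch).
    """
    return [mask_node_features(latents, mask_rate, rng) for _ in range(num_views)]


rng = random.Random(0)
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked_x, ids = mask_node_features(x, 0.5, rng)
views = remask_views(x, 0.5, 3, rng)
print(len(ids), len(views))
```

A reconstruction objective would then compare decoder outputs with the original features only at the masked node ids.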